Abstract



The spectral norm of a Boolean function $f : \{0,1\}^n \to \{-1,1\}$ is the sum of the absolute
values of its Fourier coefficients. This quantity provides useful upper and lower bounds on the
complexity of a function in areas such as learning theory, circuit complexity, and communication
complexity. In this paper, we give a combinatorial characterization for the spectral norm
of symmetric functions. We show that the logarithm of the spectral norm is of the same order
of magnitude as $r(f)\log(n/r(f))$ where $r(f) = \max\{r_0, r_1\}$, and $r_0$ and $r_1$ are the smallest
integers less than $n/2$ such that $f(x)$ or $f(x) \cdot \mathrm{PARITY}(x)$ is constant for all $x$ with
$\sum_i x_i \in [r_0, n - r_1]$. We mention some applications to the decision tree and communication
complexity of symmetric functions.

1 Introduction

The study of Boolean functions $f : \{0,1\}^n \to \{-1,1\}$ is central to complexity theory and
combinatorics, as objects of interest in these areas can often be represented as Boolean functions.
Fourier analysis of Boolean functions provides some of the strongest tools in this study, with
applications to graph theory, circuit complexity, communication complexity, hardness of
approximation, machine learning, etc.

In many different settings, Boolean functions with "smeared out" Fourier spectra have
higher "complexity". There are various useful ways to measure the spread of the spectrum.
Some notable ones are the spectral norm $\|\hat f\|_1 = \sum_S |\hat f(S)|$ (i.e., the $\ell_1$ norm), the
$\ell_\infty$ norm $\|\hat f\|_\infty = \max_S |\hat f(S)|$, and the Shannon entropy of the squares of the Fourier
coefficients $H[\hat f^2] = -\sum_S \hat f(S)^2 \log \hat f(S)^2$. The focus of this paper is on the spectral norm.
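As a concrete illustration of these definitions, the following Python sketch computes all Fourier coefficients of a small function by brute force and evaluates its spectral norm. The helper names (`fourier_coefficients`, `spectral_norm`, `majority3`) and the choice of the 3-bit majority function are illustrative assumptions, not taken from the paper; the enumeration costs $4^n$ operations and is only meant for tiny $n$.

```python
from itertools import product

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {0,1}^n -> {-1,1}; S is a 0/1 indicator tuple."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for S in points:
        # f_hat(S) = E_x[f(x) * chi_S(x)] with chi_S(x) = (-1)^(sum_{i in S} x_i)
        total = sum(f(x) * (-1) ** sum(s * xi for s, xi in zip(S, x)) for x in points)
        coeffs[S] = total / 2 ** n
    return coeffs

def spectral_norm(coeffs):
    """l_1 norm of the Fourier spectrum."""
    return sum(abs(c) for c in coeffs.values())

def majority3(x):
    # Illustrative example: output -1 when at least two of the three bits are 1.
    return -1 if sum(x) >= 2 else 1

coeffs = fourier_coefficients(majority3, 3)
print(spectral_norm(coeffs), max(abs(c) for c in coeffs.values()))
```

For this example the nonzero coefficients sit on the three singletons and on the full set, each of absolute value 1/2, so the spectral norm is 2 while the squares of the coefficients sum to 1.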

Spectral Norm of Boolean Functions 

As $\sum_S \hat f(S)^2 = 1$ for a Boolean function $f$, it is often useful to view the squares of the Fourier
coefficients as a probability distribution over the subsets $S \subseteq [n]$. The spectral norm
corresponds to the Rényi entropy of order 1/2 of the squares of the Fourier coefficients,
$H_{1/2}[\hat f^2] = 2\log\left(\sum_S |\hat f(S)|\right) = 2\log\|\hat f\|_1$. It provides useful upper and lower bounds on the
complexity of a function in settings such as learning theory, circuit complexity, and communication
complexity. It is particularly useful in the settings where PARITY is considered a function of low
complexity. We list some of the applications below.

In the setting of learning theory, the spectral norm is used in conjunction with the
Kushilevitz-Mansour Algorithm [KM91]. This algorithm, using membership queries, efficiently learns
a concept class $\mathcal{C}$ where the Fourier spectrum of every function in $\mathcal{C}$ is concentrated on a small set
of characters (this set can be different for different functions). Kushilevitz and Mansour observe
that an upper bound on the spectral norm implies such a concentration, and obtain:

If $\mathcal{C} = \{f : \{0,1\}^n \to \{-1,1\} \mid \|\hat f\|_1 \le s\}$, then $\mathcal{C}$ is learnable with membership queries
in time $\mathrm{poly}(n, s, 1/\epsilon)$.

Using the above result, they show that functions computable by small size parity decision trees$^1$
are efficiently learnable with membership queries. This is done by observing that a function
computable by a size $s$ parity decision tree satisfies $\|\hat f\|_1 \le s$. This inequality is also
interesting since it provides a lower bound in terms of the spectral norm on the size of any parity
decision tree computing $f$.

Threshold circuits (i.e., circuits composed of threshold gates) constitute an important model of
computation (in part due to their resemblance to neural networks), and they have been studied
extensively. A classical result of Bruck and Smolensky [BS92] states that a function with small
spectral norm can be represented as the sign of a polynomial with few monomials. This in turn
implies that functions with small spectral norm can be computed by depth 2 threshold circuits of
small size. The result of Bruck and Smolensky has found other interesting applications (see for
example [SB91, GHR92, Gro99, OS08]).

We now turn our attention to communication complexity. Arguably the most famous conjecture
in communication complexity is the Log Rank Conjecture, which states that the deterministic
communication complexity of a function $F : \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$ is upper bounded by
$\log^c \operatorname{rank}(M_F)$ where the matrix $M_F$ is defined as $M_F[x,y] = F(x,y)$. Grolmusz [Gro97] makes a
similar intriguing conjecture for the randomized communication complexity:

There is a constant $c$ such that the public coin randomized communication complexity
of $F : \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$ is upper bounded by $\log^c \|\hat F\|_1$.

In the same paper, Grolmusz is able to prove a much weaker upper bound of $O(\|\hat F\|_1^2\,\delta(n))$ with
$\exp(-c\,\delta(n))$ probability of error. Even this weaker result has interesting applications in circuit
complexity and decision tree complexity (see [Gro97] for more details).

Another major open problem in communication complexity is whether the classical and quantum
communication complexity of total Boolean functions $f : X \times Y \to \{-1,1\}$ (i.e., functions
defined on all of $X \times Y$) are polynomially related. It is conjectured that this is so, and research
has focused on establishing it for natural large families of functions. In an important paper
[Raz03], Razborov showed that the conjecture is true for functions of the form $F(x,y) = \mathrm{SYM}(x \wedge y)$
where SYM denotes a symmetric function, and $x \wedge y$ is the bitwise AND of $x$ and $y$. Shi and Zhang
[SZ09] verified the conjecture for $F(x,y) = \mathrm{SYM}(x \oplus y)$ where $x \oplus y$ denotes the bitwise XOR. The
next big targets are $F(x,y) = f(x \wedge y)$ and $F(x,y) = f(x \oplus y)$ for general $f$, but handling arbitrary
$f$ seems difficult at the moment.

1 Parity decision trees generalize the usual decision tree model: in every node we branch according to the parity of a 
subset of the variables. 
A variant of the spectral norm, the approximate spectral norm, is intimately related to the
communication complexity of "xor functions". The $\epsilon$-approximate spectral norm of $f$, denoted
$\|\hat f\|_{1,\epsilon}$, is the smallest spectral norm of a function $g : \{0,1\}^n \to \mathbb{R}$ such that
$\|f - g\|_\infty \le \epsilon$. It is known (see for example [LS09]) that $\log\|\hat f\|_{1,\epsilon}$ lower bounds the quantum
bounded error communication complexity of $f(x \oplus y)$. We expect that the lower bound $\log\|\hat f\|_{1,\epsilon}$
is tight, and that this quantity characterizes the communication complexity of xor functions. More
discussion on the communication complexity of xor functions, and how it relates to this work, is
given in Section 5.

This ends our discussion of the use of the spectral norm in learning theory, circuit complexity 
and communication complexity. We conclude this subsection by mentioning a relatively recent 
result that studies the spectral norm of Boolean functions. Green and Sanders [GS08] show that
every Boolean function whose spectral norm is bounded by a constant can be written as a sum 
of constantly many ± indicators of cosets. This gives an interesting characterization of Boolean 
functions with small spectral norm. 

Fourier Spectrum of Symmetric Functions 

A function $f : \{0,1\}^n \to \{-1,1\}$ is called symmetric if it is invariant under permutations of the
coordinates. In other words, the value of $f(x)$ depends only on $\sum_i x_i$ (i.e., $f(x) = f(y)$ whenever
$\sum_i x_i = \sum_i y_i$). Symmetric functions are at the heart of complexity theory as natural functions
like AND, OR, MAJORITY, and $\mathrm{MOD}_m$ are all symmetric. They are often the starting point of
investigation because the symmetry of the function can be exploited. On the other hand, they can
also have surprising power. In several settings, functions such as PARITY and MAJORITY represent
"hard" functions. Given their central role, it is of interest to gain insight into the Fourier spectrum
of symmetric functions.

There are various nice results related to the Fourier spectrum of symmetric functions. We cite a
few of them here. A beautiful result of Paturi [Pat92] tightly characterizes the approximate degree
of every symmetric function, and this has found many applications in theoretical computer science
[Raz03, BBC+01, She09, dW08, She11]. Kolountzakis et al. [KLM+09] studied the so-called minimal
degree of symmetric functions and applied their result in learning theory. Shpilka and Tal [ST11]
later simplified and improved the work of Kolountzakis et al. Recently, O'Donnell, Wright and
Zhou [OWZ11] verified an important conjecture in the analysis of Boolean functions, the Fourier
Entropy/Influence Conjecture, in the setting of symmetric functions. In fact we make use of their
key lemma in this paper.

1.1 Our Results and Proof Overview 

We give a combinatorial characterization of the spectral norm of symmetric functions. For
$x \in \{0,1\}^n$, define $|x| \stackrel{\mathrm{def}}{=} \sum_i x_i$. For a function $f : \{0,1\}^n \to \{-1,1\}$, let $r_0$ and $r_1$ be the
minimum integers less than $n/2$ such that $f(x)$ or $f(x) \cdot \mathrm{PARITY}(x)$ is constant for $x$ with
$|x| \in [r_0, n - r_1]$. Define $r(f) \stackrel{\mathrm{def}}{=} \max\{r_0, r_1\}$. We show that $\log\|\hat f\|_1$ is of the same order of
magnitude as $r(f)\log(n/r(f))$:

Theorem 1.1 (Main Theorem). For any symmetric function $f : \{0,1\}^n \to \{-1,1\}$, we have

$$\log\|\hat f\|_1 = \Theta\left(r(f)\log\left(\frac{n}{r(f)}\right)\right)$$

whenever $r(f) > 1$. If $r(f) \le 1$, then $\|\hat f\|_1 = O(1)$.
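To make the parameter $r(f)$ concrete, here is a minimal Python sketch that computes $r_0$ and $r_1$ from the value vector of a symmetric function. It follows one reading of the definition above: the interval $[r_0, n - r_1]$ must contain the innermost weights around $n/2$, so the code checks whether $f$ or $f \cdot \mathrm{PARITY}$ is constant there and then extends the interval outward as far as possible. The function name `r_values`, the input encoding (`vals[k]` is the value of $f$ on inputs of weight $k$, with $n \ge 1$), and the `None` return value for the case where no admissible pair exists are assumptions made for illustration.

```python
def r_values(vals):
    """vals[k] = f(x) for |x| = k, entries +1 or -1. Returns (r0, r1) or None."""
    n = len(vals) - 1
    lo = (n - 1) // 2          # largest integer strictly below n/2
    hi = n - lo
    for use_parity in (False, True):
        # g is f itself, or f * PARITY, viewed as a function of the weight k
        g = [v * (-1) ** k if use_parity else v for k, v in enumerate(vals)]
        if len(set(g[lo:hi + 1])) != 1:
            continue           # not constant even on the innermost interval
        c = g[lo]
        r0 = lo
        while r0 > 0 and g[r0 - 1] == c:
            r0 -= 1            # extend the constant interval to the left
        r1 = lo
        while r1 > 0 and g[n - r1 + 1] == c:
            r1 -= 1            # extend the constant interval to the right
        return r0, r1
    return None

# AND on 5 bits (value -1 only at weight 5), MAJORITY on 5 bits, PARITY on 4 bits
print(r_values([1, 1, 1, 1, 1, -1]))
print(r_values([1, 1, 1, -1, -1, -1]))
print(r_values([1, -1, 1, -1, 1]))
```

Under this reading, AND has $r(f) = 1$ (consistent with its constant spectral norm), PARITY has $r(f) = 0$, and MAJORITY on 5 bits has $r(f) = 2$, the largest value allowed below $n/2$.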



As an application, we give a characterization of the parity decision tree size of symmetric
functions. As mentioned in Section 1, a parity decision tree computes a Boolean function by querying
the parities of subsets of the variables. The size of the tree is simply the number of leaves in the
tree.

Corollary 1.2. Let $f : \{0,1\}^n \to \{-1,1\}$ be a symmetric function. Then the parity decision tree size of $f$
is $2^{\Theta(r(f)\log(n/r(f)))}$.

The proof of this corollary is presented in Section 4. Note that the lower bound also applies in
the case of the usual decision tree size (where one is restricted to query only variables). Decision
tree size is an important measure in learning theory; algorithms for learning decision trees
efficiently are of great interest both for practical and theoretical reasons. One of the most
well-known and studied problems is whether small size decision trees are efficiently learnable from
uniformly random examples.

As a second application, using the protocol of Shi and Zhang [SZ09, Proposition 3.4], and
the observation that $\|\hat F\|_1 = \|\hat f\|_1$ when $F(x,y) = f(x \oplus y)$, we verify Grolmusz's conjecture
mentioned earlier in Section 1 in the setting of symmetric xor functions.

Corollary 1.3. Let $f : \{0,1\}^n \to \{-1,1\}$ be a symmetric function and let $F : \{0,1\}^n \times \{0,1\}^n \to
\{-1,1\}$ be defined as $F(x,y) = f(x \oplus y)$. Then the public coin constant error randomized communication
complexity of $F$ is upper bounded by $O(\log^2 \|\hat F\|_1)$.

We now give an outline for the proof of Theorem 1.1. The upper bound is quite straightforward
and is given in Lemma 3.1. The lower bound is handled in two different cases: when $r(f)$ is
bounded away from $n/2$ (Lemma 3.3) and when $r(f)$ is close to $n/2$ (Lemma 3.5).

We refer to the Fourier spectrum of $f$ restricted to the sets $S \subseteq [n]$ of size $k$ as the $k$-th level
of the Fourier spectrum. Note that for a symmetric $f$, we have $\hat f(S) = \hat f(T)$ whenever $|S| = |T|$.
Therefore the Fourier spectrum is maximally spread out in each level. The overall strategy for the
lower bound is to show an appropriate lower bound on the $\ell_2$ mass of the Fourier spectrum on a
middle level. Middle levels have many Fourier coefficients, and therefore contribute significantly
to the spectral norm provided there is enough $\ell_2$ mass on them. An important tool in our analysis
is the use of certain discrete derivatives of $f$. Identify $\{0,1\}^n$ with $\mathbb{F}_2^n$ and let $e_1, \ldots, e_n$ denote the
standard vectors in $\mathbb{F}_2^n$. For $i \neq j$, define $f_{ij}(x) \stackrel{\mathrm{def}}{=} f(x + e_i + e_j) - f(x)$. We observe that

$$\sum_{i \neq j} \mathbb{E}[f_{ij}^2] = 8\sum_S |S|(n-|S|)\hat f(S)^2.$$

The quantity on the LHS, and therefore the RHS, can be lower bounded using $r(f)$ (Lemma 3.2).
As the coefficient $|S|(n-|S|)$ increases as $|S|$ approaches $n/2$, we are able to give a lower bound
on the $\ell_2$ mass of the Fourier spectrum on the middle levels. This approach gives tight bounds for
$r(f)$ bounded away from $n/2$, but not for a function such as MAJORITY.

To handle functions $f$ with $r(f)$ close to $n/2$, we use ideas from [OWZ11]. The main lemma of
[OWZ11] states that the first derivatives of a symmetric function are noise sensitive. We observe
that this is also true for the derivatives $f_{ij}$. This allows us to derive the inequality

\S\(n - \S\)f(S)\p\ s \ + //HSI) < 8 _ ^ \S\)f(Sf, 
where $\rho = (1 - c/n)$. The quantity $\rho^{|S|} + \rho^{n-|S|}$ is decreasing in $|S|$ for $|S| \le n/2$. Thinking of $c$ as
a large constant, we see that the dampening of the middle levels with $\rho^{|S|} + \rho^{n-|S|}$ decreases the
value of the sum significantly. From this, we can lower bound the $\ell_2$ mass of the middle levels.
Note that if $\sum_S |S|(n-|S|)\hat f(S)^2$ is small to begin with ($r(f)$ is small), the above inequality is
not useful. On the other hand if $r(f)$ is large, $\sum_S |S|(n-|S|)\hat f(S)^2$ is large, and the strategy just
described gives good bounds.



2 Preliminaries 

We view Boolean functions $f : \{0,1\}^n \to \{-1,1\}$ as residing in the vector space $\{f : \{0,1\}^n \to \mathbb{C}\}$.
If we view the domain as the group $\mathbb{F}_2^n$, we can appeal to Fourier analysis, and express every
$f : \{0,1\}^n \to \mathbb{C}$ (uniquely) as a linear combination of the characters of $\mathbb{F}_2^n$. That is, every function
$f : \mathbb{F}_2^n \to \mathbb{C}$ can be written as $f = \sum_{S \subseteq [n]} \hat f(S)\chi_S$, where the characters $\chi_S$ are defined as
$\chi_S : x \mapsto (-1)^{\sum_{i \in S} x_i}$, and $\hat f(S) \in \mathbb{C}$ are their corresponding Fourier coefficients. Since the
characters form an orthonormal basis for $\{f : \{0,1\}^n \to \mathbb{C}\}$, we have $\hat f(S) = \langle f, \chi_S\rangle = \mathbb{E}_x[f(x)\chi_S(x)]$.

For a Boolean function $f$, we define $W_k[f] = \sum_{|S|=k} |\hat f(S)|^2$. We simply use $W_k$ when $f$ is
clear from the context. For a symmetric function, we often write $f(k)$ for $f(x)$ with $\sum_i x_i = k$ and
$k \in [n]$. We use $h$ to denote the binary entropy function $h(\alpha) = -\alpha\log(\alpha) - (1-\alpha)\log(1-\alpha)$.
We will use the following simple estimates for binomial coefficients (see [MU05, Lemma 9.2]): Let
$\alpha \in [0,1]$ such that $\alpha n$ is an integer. Then

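$$\sum_{k=0}^{\alpha n}\binom{n}{k} \le 2^{nh(\alpha)} \qquad (1)$$

(the first bound requiring $\alpha \le 1/2$) and

$$\frac{2^{nh(\alpha)}}{n+1} \le \binom{n}{\alpha n}. \qquad (2)$$

If $\alpha \in [0, 1/2]$ is arbitrary, then

$$\frac{2^{nh(\alpha)}}{n(n+1)} \le \binom{n}{\lfloor \alpha n \rfloor} \le 2^{nh(\alpha)}. \qquad (3)$$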

The following fact is also easy and classical. For every constant $c > 0$, there exists a constant
$C > 0$ such that for any $n \ge 1$,

$$\binom{n}{\lfloor n/2 - c\sqrt{n} \rfloor} \ge C\,\frac{2^n}{\sqrt{n}}. \qquad (4)$$
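The entropy estimates for binomial coefficients are easy to sanity-check numerically. The sketch below assumes their standard forms, namely $\frac{2^{nh(\alpha)}}{n+1} \le \binom{n}{\alpha n} \le 2^{nh(\alpha)}$ for every integer $\alpha n$ and $\sum_{k \le \alpha n}\binom{n}{k} \le 2^{nh(\alpha)}$ for $\alpha \le 1/2$, with $h$ the base-2 binary entropy. The function names and the small relative tolerance used to absorb floating-point error are illustrative choices.

```python
from math import comb, log2

def h(a):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if a in (0, 1):
        return 0.0
    return -a * log2(a) - (1 - a) * log2(1 - a)

def check_estimates(n, tol=1e-9):
    """Check the three entropy bounds for every k = alpha * n with 0 <= k <= n."""
    for k in range(n + 1):
        upper = 2 ** (n * h(k / n))
        # lower and upper bound on a single binomial coefficient
        assert upper / (n + 1) <= comb(n, k) * (1 + tol)
        assert comb(n, k) <= upper * (1 + tol)
        # bound on the lower tail, valid for alpha <= 1/2
        if 2 * k <= n:
            assert sum(comb(n, j) for j in range(k + 1)) <= upper * (1 + tol)
    return True

print(check_estimates(10), check_estimates(41))
```

Such a check does not replace the proof in [MU05], but it is a quick way to catch a misremembered exponent or a missing factor of $n + 1$.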



Definition 2.1. For any $f : \{0,1\}^n \to \mathbb{R}$, we define

$$R(f) = \sum_{S \subseteq [n]} |S|(n-|S|)\hat f(S)^2.$$

For $a \in \mathbb{F}_2^n$, we define the derivative of $f : \mathbb{F}_2^n \to \mathbb{R}$ in the direction $a$ as

$$\Delta_a f : x \mapsto f(x+a) - f(x).$$

Let $e_1, \ldots, e_n$ denote the standard vectors in $\mathbb{F}_2^n$, and let $f : \{0,1\}^n \to \mathbb{R}$. For all $i \neq j$, define

$$f_{ij} = \Delta_{e_i + e_j} f. \qquad (5)$$
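Lemma 2.2. For every $f : \{0,1\}^n \to \mathbb{R}$, we have

$$\sum_{i \neq j} \mathbb{E}[f_{ij}^2] = 8R(f).$$

Proof. We have

$$f_{ij}(x) = \sum_S \hat f(S)\chi_S(x)\left(\chi_S(e_i + e_j) - 1\right) = \sum_{S : |S \cap \{i,j\}| = 1} -2\hat f(S)\chi_S(x),$$

which by Parseval's identity implies

$$\mathbb{E}[f_{ij}^2] = \sum_{S : |S \cap \{i,j\}| = 1} 4\hat f(S)^2.$$

Summing over all pairs $i \neq j$, we obtain

$$\sum_{i \neq j} \mathbb{E}[f_{ij}^2] = 8\sum_{S \subseteq [n]} |S|(n-|S|)\hat f(S)^2 = 8R(f).$$

□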
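The identity of Lemma 2.2 can be checked by brute force on a small random real-valued function. In the sketch below, the sum over $i \neq j$ is taken over ordered pairs, which is the convention under which the factor 8 comes out (each set $S$ is split by $2|S|(n-|S|)$ ordered pairs). The function name `check_lemma_2_2` and the random test function are assumptions made for illustration.

```python
from itertools import product
import random

def check_lemma_2_2(n, seed=0):
    """Return (sum_{i != j} E[f_ij^2], 8 * R(f)) for a random f: {0,1}^n -> R."""
    rng = random.Random(seed)
    points = list(product((0, 1), repeat=n))
    f = {x: rng.uniform(-1, 1) for x in points}
    # Fourier coefficients by direct summation; S is a 0/1 indicator tuple
    fhat = {
        S: sum(f[x] * (-1) ** sum(s * xi for s, xi in zip(S, x)) for x in points) / 2 ** n
        for S in points
    }
    R = sum(sum(S) * (n - sum(S)) * c ** 2 for S, c in fhat.items())
    lhs = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for x in points:
                y = list(x)
                y[i] ^= 1          # flip coordinates i and j, i.e. add e_i + e_j
                y[j] ^= 1
                lhs += (f[tuple(y)] - f[x]) ** 2 / 2 ** n
    return lhs, 8 * R

lhs, rhs = check_lemma_2_2(4)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point error.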



3 Proof of Theorem 1.1

As mentioned earlier, the upper bound is proved in Lemma 3.1. The proof of the lower bound
is divided into two parts: Lemma 3.3 handles the case where $r$ is bounded away from $n/2$ and
Lemma 3.5 the case when $r$ is close to $n/2$.



3.1 Upper Bound 

Lemma 3.1. For all $n \ge 1$ and every symmetric function $f : \{0,1\}^n \to \{-1,1\}$,

$$\log\|\hat f\|_1 \le 2r(f)\log(n/r(f)) + 3.$$

Proof. By definition of $r_0$ and $r_1$, there exists a function $p \in \{-1, 1, -\mathrm{PARITY}, +\mathrm{PARITY}\}$ such that
$f(k) = p(k)$ for all $k \in [r_0, n - r_1]$. By linearity of the Fourier transform, we have for any $S \subseteq [n]$,

$$\hat f(S) = \frac{1}{2^n}\sum_{k=0}^{n} f(k)\sum_{|x|=k}\chi_S(x)
= \hat p(S) + \frac{1}{2^n}\sum_{k=0}^{r_0-1}\left(f(k)-p(k)\right)\sum_{|x|=k}\chi_S(x) + \frac{1}{2^n}\sum_{k=n-r_1+1}^{n}\left(f(k)-p(k)\right)\sum_{|x|=k}\chi_S(x).$$

Thus,

$$|\hat f(S)| \le |\hat p(S)| + \frac{1}{2^n}\sum_{k=0}^{r_0-1} 2\binom{n}{k} + \frac{1}{2^n}\sum_{k=n-r_1+1}^{n} 2\binom{n}{k}
\le |\hat p(S)| + \frac{2}{2^n}\left(2^{h(r_0/n)n} + 2^{h(r_1/n)n}\right).$$

For the last inequality, we used (1). Summing over all subsets $S \subseteq [n]$, we get

$$\|\hat f\|_1 \le 1 + 2\left(2^{h(r_0/n)n} + 2^{h(r_1/n)n}\right) \le 1 + 4 \cdot 2^{h(r/n)n}.$$

As $h(t) \le -2t\log t$ when $t \le 1/2$, we obtain $\log\|\hat f\|_1 \le 3 + 2r\log(n/r)$. □
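The upper bound of Lemma 3.1 can be compared against an exact computation for a concrete symmetric function. The sketch below uses the fact that, for a symmetric $f$, the coefficient $\hat f(S)$ depends only on $k = |S|$ and equals $2^{-n}\sum_j f(j) K_j(k)$, where $K_j(k) = \sum_i (-1)^i \binom{k}{i}\binom{n-k}{j-i}$ is the sum of $\chi_S(x)$ over inputs of weight $j$ (a Krawtchouk polynomial value). The helper names, the example function ($-1$ exactly on weights below $r$, so that $r_0 = r$ and $r_1 = 0$), and the parameters $n = 20$, $r = 3$ are illustrative assumptions; logarithms are taken base 2, matching the binary entropy.

```python
from math import comb, log2

def level_coefficients(vals):
    """vals[j] = f(x) for |x| = j. Returns [c_0..c_n] with c_k = f_hat(S) for any |S| = k."""
    n = len(vals) - 1
    coeffs = []
    for k in range(n + 1):
        total = 0
        for j, v in enumerate(vals):
            # sum of chi_S(x) over all x of weight j, for a fixed S of size k
            kraw = sum((-1) ** i * comb(k, i) * comb(n - k, j - i)
                       for i in range(min(j, k) + 1))
            total += v * kraw
        coeffs.append(total / 2 ** n)
    return coeffs

def symmetric_spectral_norm(vals):
    """Spectral norm: each level k contributes C(n, k) equal coefficients."""
    n = len(vals) - 1
    return sum(comb(n, k) * abs(c) for k, c in enumerate(level_coefficients(vals)))

n, r = 20, 3
vals = [-1 if k < r else 1 for k in range(n + 1)]   # r_0 = r, r_1 = 0, hence r(f) = r
norm = symmetric_spectral_norm(vals)
bound = 2 * r * log2(n / r) + 3
print(log2(norm), bound)
```

The first printed value is the exact $\log\|\hat f\|_1$ and the second is the bound $2r\log(n/r) + 3$; the lemma predicts that the first does not exceed the second.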

3.2 Lower Bound 

We start by making some simple observations. 

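Lemma 3.2. Let $f : \{0,1\}^n \to \{-1,1\}$ be a symmetric function, and define $r_0 = r_0(f)$ and $r_1 = r_1(f)$.
Then

$$R(f) \ge \left((n-r_0+1)(n-r_0)\binom{n}{r_0-1} + (n-r_1+1)(n-r_1)\binom{n}{r_1-1}\right)2^{-n}. \qquad (6)$$

Moreover, assuming that $f(s) = 1$ for all $s \in \{r_0, \ldots, n-r_1\}$, we have

$$\sum_{S \neq \emptyset}\hat f(S)^2 \le 4\left(\sum_{s<r_0}\binom{n}{s} + \sum_{s<r_1}\binom{n}{s}\right)2^{-n}. \qquad (7)$$

Proof. Define $f_{ij}$ as in (5). As $f$ is symmetric, we only need to consider $f_{12}$.

$$\mathbb{E}[f_{12}^2] = \mathbb{E}_{x_3 \ldots x_n}\,\frac{1}{4}\left(f_{12}^2(00x_3\ldots x_n) + f_{12}^2(01x_3\ldots x_n) + f_{12}^2(10x_3\ldots x_n) + f_{12}^2(11x_3\ldots x_n)\right)$$

$$= \frac{1}{4}\,\mathbb{E}_{x_3 \ldots x_n}\left[\left(f(00x_3\ldots x_n) - f(11x_3\ldots x_n)\right)^2 + \left(f(11x_3\ldots x_n) - f(00x_3\ldots x_n)\right)^2\right]$$

$$\ge \frac{1}{2}\left(\binom{n-2}{r_0-1}2^{-(n-2)} \cdot 4 + \binom{n-2}{n-r_1-1}2^{-(n-2)} \cdot 4\right)$$

$$= \left(\frac{(n-r_0+1)(n-r_0)}{n(n-1)}\binom{n}{r_0-1} + \frac{(n-r_1+1)(n-r_1)}{n(n-1)}\binom{n}{r_1-1}\right)2^{3-n}.$$

Inequality (6) follows by applying Lemma 2.2.

In order to establish inequality (7), we show a lower bound on the principal Fourier coefficient
of $f$:

$$\hat f(\emptyset) \ge 1 - 2\left(\sum_{s<r_0}\binom{n}{s} + \sum_{s>n-r_1}\binom{n}{s}\right)2^{-n},$$

which implies that

$$\hat f(\emptyset)^2 \ge 1 - 4\left(\sum_{s<r_0}\binom{n}{s} + \sum_{s<r_1}\binom{n}{s}\right)2^{-n}.$$

Together with $\sum_S \hat f(S)^2 = 1$, this gives (7).

□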
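Both inequalities of Lemma 3.2 can be checked numerically on an example. The sketch below takes inequality (6) in the form $R(f) \ge \left((n-r_0+1)(n-r_0)\binom{n}{r_0-1} + (n-r_1+1)(n-r_1)\binom{n}{r_1-1}\right)2^{-n}$ and inequality (7) in the form $\sum_{S\neq\emptyset}\hat f(S)^2 \le 4\left(\sum_{s<r_0}\binom{n}{s}+\sum_{s<r_1}\binom{n}{s}\right)2^{-n}$. The test function ($+1$ on weights $r_0, \ldots, n - r_1$ and $-1$ elsewhere, with $r_0, r_1 \ge 1$), the helper names, and the parameters are hypothetical choices for illustration.

```python
from math import comb

def level_coefficients(vals):
    """vals[j] = f(x) for |x| = j. Returns [c_0..c_n] with c_k = f_hat(S) for any |S| = k."""
    n = len(vals) - 1
    coeffs = []
    for k in range(n + 1):
        total = 0
        for j, v in enumerate(vals):
            # sum of chi_S(x) over all x of weight j, for a fixed S of size k
            kraw = sum((-1) ** i * comb(k, i) * comb(n - k, j - i)
                       for i in range(min(j, k) + 1))
            total += v * kraw
        coeffs.append(total / 2 ** n)
    return coeffs

def check_lemma_3_2(n, r0, r1):
    """Return (R(f), rhs of (6), non-principal Fourier mass, rhs of (7)); needs r0, r1 >= 1."""
    vals = [1 if r0 <= k <= n - r1 else -1 for k in range(n + 1)]
    c = level_coefficients(vals)
    R = sum(k * (n - k) * comb(n, k) * c[k] ** 2 for k in range(n + 1))
    rhs6 = ((n - r0 + 1) * (n - r0) * comb(n, r0 - 1)
            + (n - r1 + 1) * (n - r1) * comb(n, r1 - 1)) / 2 ** n
    mass = sum(comb(n, k) * c[k] ** 2 for k in range(1, n + 1))
    rhs7 = 4 * (sum(comb(n, s) for s in range(r0))
                + sum(comb(n, s) for s in range(r1))) / 2 ** n
    return R, rhs6, mass, rhs7

print(check_lemma_3_2(12, 3, 2))
```

For $n = 12$, $r_0 = 3$, $r_1 = 2$ the first value should be at least the second, and the third at most the fourth.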



3.2.1 Lower Bound: $r < n/2$

Lemma 3.3. For every symmetric function $f : \{0,1\}^n \to \{-1,1\}$ with $r = r(f)$,

$$\log\|\hat f\|_1 \ge \Omega\left(\left(1 - \frac{2(r-1)}{n}\right) r\log(n/r)\right).$$



Proof. Observe that we can assume without loss of generality that $f(s) = 1$ for all $s \in \{r_0, \ldots, n - r_1\}$.
In fact, to handle the case $f = -1$ or $f = \pm\mathrm{PARITY}$ in $[r_0, n - r_1]$, it suffices to multiply the
function by $-1$ or by $\pm\mathrm{PARITY}$, respectively. This does not affect the spectral norm of the function.

We prove the statement by showing that a significant portion of the $\ell_2$ mass of $\hat f$ sits in the
middle levels from $m$ to $n - m$ for a well-chosen $m$ depending on $r(f)$.



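Define $\alpha_0 = \frac{r_0 - 1}{n} < 1/2$ and $\alpha_1 = \frac{r_1 - 1}{n}$. We also let

$$m_0 = \left\lfloor \frac{n}{2}\left(1 - \sqrt{4\alpha_0 - 6\alpha_0^2 + 4\alpha_0^3}\right)\right\rfloor
\quad\text{and}\quad
m_1 = \left\lfloor \frac{n}{2}\left(1 - \sqrt{4\alpha_1 - 6\alpha_1^2 + 4\alpha_1^3}\right)\right\rfloor.$$

By Lemma 3.2, we have $\sum_{k>0} W_k \le 4\left(\sum_{s<r_0}\binom{n}{s} + \sum_{s<r_1}\binom{n}{s}\right)2^{-n}$. Let $U_k$ and $V_k$ be so that
$W_k = U_k + V_k$ and $\sum_{k>0} U_k \le 4 \cdot 2^{-n}\sum_{s<r_0}\binom{n}{s}$ and $\sum_{k>0} V_k \le 4 \cdot 2^{-n}\sum_{s<r_1}\binom{n}{s}$. Our objective
is now to obtain a lower bound on $\sum_{k=m_0}^{n-m_0} k(n-k)U_k + \sum_{k=m_1}^{n-m_1} k(n-k)V_k$ using Lemma 3.2.

$$\sum_{k=m_0}^{n-m_0} k(n-k)U_k + \sum_{k=m_1}^{n-m_1} k(n-k)V_k = R(f) - \sum_{k \notin [m_0, n-m_0]} k(n-k)U_k - \sum_{k \notin [m_1, n-m_1]} k(n-k)V_k$$

$$\ge (n-r_0)(n-r_0+1)\binom{n}{r_0-1}2^{-n} - (m_0-1)(n-m_0+1)\,4 \cdot 2^{-n}\sum_{s<r_0}\binom{n}{s}$$

$$\quad + (n-r_1)(n-r_1+1)\binom{n}{r_1-1}2^{-n} - (m_1-1)(n-m_1+1)\,4 \cdot 2^{-n}\sum_{s<r_1}\binom{n}{s}. \qquad (8)$$

Define $A_0 = (n-r_0)(n-r_0+1)\binom{n}{r_0-1}2^{-n} - (m_0-1)(n-m_0+1)\,4 \cdot 2^{-n}\sum_{s<r_0}\binom{n}{s}$, and let $A_1$ be
its analogue for $r_1$ so that the right hand side of (8) equals $A_0 + A_1$.

Observe that $\binom{n}{a} = \frac{a+1}{n-a}\binom{n}{a+1}$, and $\frac{a+1}{n-a} \le \frac{r_0-1}{n-r_0+1} = \frac{\alpha_0}{1-\alpha_0}$ for $a < r_0 - 1$. Thus

$$A_0 \ge \binom{n}{r_0-1}2^{-n}\left((n-\alpha_0 n-1)(n-\alpha_0 n) - 4(m_0-1)(n-m_0+1)\frac{1}{1-\alpha_0/(1-\alpha_0)}\right)$$

$$= \binom{n}{r_0-1}2^{-n}\left((1-\alpha_0)^2 n^2 - (1-\alpha_0)n - 4(m_0-1)(n-m_0+1)\frac{1-\alpha_0}{1-2\alpha_0}\right)$$

$$\ge \binom{n}{r_0-1}2^{-n}\left(n^2\left((1-\alpha_0)^2 - \left(1-(4\alpha_0-6\alpha_0^2+4\alpha_0^3)\right)\frac{1-\alpha_0}{1-2\alpha_0}\right) - (1-\alpha_0)n\right)$$

$$= \binom{n}{r_0-1}2^{-n}(1-\alpha_0)\left(n^2\left((1-\alpha_0) - (1-2\alpha_0+2\alpha_0^2)\right) - n\right)$$

$$= \binom{n}{r_0-1}2^{-n}(1-\alpha_0)\left(\alpha_0(1-2\alpha_0)n^2 - n\right). \qquad (9)$$

Analogously, we have

$$A_1 \ge \binom{n}{r_1-1}2^{-n}(1-\alpha_1)\left(\alpha_1(1-2\alpha_1)n^2 - n\right). \qquad (10)$$

We now assume that $r_0 \ge r_1$. Observe that we then have $m_0 \le m_1$. Combining (8) and (9), we get

$$n^2\sum_{k=m_0}^{n-m_0} W_k \ge \sum_{k=m_0}^{n-m_0} k(n-k)W_k \ge \binom{n}{r_0-1}2^{-n}(1-\alpha_0)\left(\alpha_0(1-2\alpha_0)n^2 - n\right).$$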
k=mo k=mo ^ " ' 
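Note that for symmetric functions $\|\hat f\|_1 = \sum_{k=0}^{n}\sqrt{\binom{n}{k}W_k}$, and thus

$$\|\hat f\|_1 \ge \sum_{k=m_0}^{n-m_0}\sqrt{\binom{n}{k}}\sqrt{W_k} \ge \sqrt{\binom{n}{m_0}\sum_{k=m_0}^{n-m_0}W_k}
\ge \sqrt{\binom{n}{m_0}\binom{n}{r_0-1}2^{-n}\,\frac{(1-\alpha_0)\left(\alpha_0(1-2\alpha_0)n^2-n\right)}{n^2}}. \qquad (11)$$

Using (2) and (3), we obtain

$$\|\hat f\|_1 \ge \sqrt{2^{\,n\left(h\left(\frac12-\frac12\sqrt{4\alpha_0-6\alpha_0^2+4\alpha_0^3}\right)+h(\alpha_0)-1\right)}\cdot\frac{(1-\alpha_0)\left(\alpha_0(1-2\alpha_0)n^2-n\right)}{n^3(n+1)^2}}.$$

As a result

$$\log\|\hat f\|_1 \ge \frac{n}{2}\left(h\left(\frac12-\frac12\sqrt{4\alpha_0-6\alpha_0^2+4\alpha_0^3}\right)+h(\alpha_0)-1\right) + \frac12\log\frac{(1-\alpha_0)\left(\alpha_0(1-2\alpha_0)n^2-n\right)}{n^3(n+1)^2}.$$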

Claim 1. There exists a constant $c > 0$ such that for every $\alpha_0 \in (0, 1/2)$,

$$h\left(\frac12-\frac12\sqrt{4\alpha_0-6\alpha_0^2+4\alpha_0^3}\right)+h(\alpha_0)-1 \ge c\,(1-2\alpha_0)\cdot\alpha_0\cdot\log(1/\alpha_0). \qquad (12)$$



Proof. Using the inequality \h(x2) — h(xi)\ < h(x2 — x\) which holds for every < x\ < X2 < 1, 
we have 



1 1 

2 ~ 2 



M - - ^a/4oo - 6a 2 , + 4^ + h(a ) - 1 > /i(«o) - ^ 77 V 4a ° ~ 6a o + 4a o 



By looking at the Taylor expansion, it is easy to see that there exists an e > 0, such that for every 

ao G [0, e] U [| — e, |] we have 

/i(a ) - h Q-\/ 4ao - 6«q + 4a o) > ^(1 - 2 «o) • "o • log(l/a )- 

On the other hand, there exists a constant c e > such that when a £ ( e ) 1/2 — e), both ft(ao) — 
h \ h-\/AaQ — 6a 2 , + 4a|]^ and the right-hand side of ((12)) belong to [c e , 1]. Taking c = f l/c e finishes 
the proof. □ 

Using this claim, we obtain

$$\log\|\hat f\|_1 \ge c\,(1-2\alpha_0)\cdot\alpha_0\log(1/\alpha_0)\cdot\frac{n}{2} + \frac12\log\frac{(1-\alpha_0)\left(\alpha_0(1-2\alpha_0)n^2-n\right)}{n^3(n+1)^2}.$$

This proves the desired result provided $r(f)$ is larger than some constant. Next we handle
small (constant) values of $r(f)$. We start with the case $r(f) = 1$. In this case, it is easy to see that
$\|\hat f\|_1 = O(1)$. Next, we consider $r(f) = 2$. Let $g_k(x) = -1$ iff $|x| = \sum_i x_i = k$. For the function $g_1$,
we have for $S \neq \emptyset$,

$$\hat g_1(S) = -\frac{2}{2^n}\sum_{|x|=1}\chi_S(x) = -\frac{2}{2^n}\sum_{i=1}^{n}\chi_S(e_i) = -\frac{2}{2^n}\left(n-|S|-|S|\right) = -\frac{2(n-2|S|)}{2^n}.$$

Hence,

$$\|\hat g_1\|_1 = 1 - \frac{2n}{2^n} + \frac{2}{2^n}\sum_{k=1}^{n}\binom{n}{k}|n-2k| = \Theta(\sqrt{n}),$$

by observing that a constant fraction of the probability mass of the binomial distribution lies in
the interval $[n/2 - 2\sqrt{n}, n/2 - \sqrt{n}]$. Similarly, one can show that $\|\hat g_1 + \hat g_{n-1}\|_1 = \Theta(\sqrt{n})$. All other
functions with $r(f) = 2$ are obtained from these two functions by adding functions $g_0$ or $g_n$ and
by multiplying by a constant or the parity function.

We now consider the case $r(f) \ge 3$, but constant. We perform an analysis similar to the proof
of Lemma 3.3. We can assume that $r_0 \ge r_1$. We take $m_0 = \left\lfloor \frac{n}{2}\left(1 - \sqrt{5\alpha_0 - 6\alpha_0^2}\right)\right\rfloor$. As in (9), we
obtain the bound

$$A_0 \ge \binom{n}{r_0-1}2^{-n}(1-\alpha_0)(2\alpha_0 n^2 - n).$$

Hence, the analogue of inequality (11) becomes

$$\|\hat f\|_1 \ge \sqrt{\binom{n}{m_0}\binom{n}{r_0-1}2^{-n}\,\frac{(1-\alpha_0)(2\alpha_0 n^2-n)}{n^2}}.$$

But $\frac{n}{2}\left(1 - \sqrt{5\alpha_0 - 6\alpha_0^2}\right) = n/2 - \Theta(\sqrt{n})$ and thus $\binom{n}{m_0} = \Omega(2^n/\sqrt{n})$ (see inequality (4)).
As a result,

$$\|\hat f\|_1 \ge \Omega\left(\sqrt{\frac{1}{n}\cdot\frac{1}{\sqrt{n}}\binom{n}{r_0-1}}\right) = \Omega\left(\sqrt{\binom{n}{r_0-1}\,n^{-3/2}}\right),$$

which proves the lemma.

□
3.2.2 Lower Bound: $r \approx n/2$

For the case $r \approx n/2$, we use a result of [OWZ11] that states that the derivative of a symmetric
Boolean function is noise sensitive. Here, we use the noise sensitivity of the derivative $f_{ij}$. The
following lemma is an analogue of [OWZ11, Theorem 6].

Lemma 3.4. Let $f$ be a symmetric Boolean function and $f_{ij}$ be defined as in (5). Then for $\rho = 1 - c/n$, we
have

s v s 

for any $c \in [1, n]$. Summing over all $i, j$ with $i \neq j$, we get

8j2\S\(n - |5|)/(5)V S| < 4= ■ 8R(f)- (14) 

Proof. The proof is the same as the proof of [OWZ11, Theorem 6] except that we use $f_{ij}$ instead of
the derivative. We have

X^-csyy^E* [&(z)e„ [fMw 

s 

where $x, y$ are $\rho$-correlated uniform random variables taking values in $\{0,1\}^n$. Note that we can
write for any $x$

$$\left|\mathbb{E}_y[f_{12}(y) \mid x]\right| = \left|\mathbb{E}_{y_3 \ldots y_n}\left[\left(\mathbb{P}[y_1y_2 = 00 \mid x] - \mathbb{P}[y_1y_2 = 11 \mid x]\right)\left(f(11y_3\ldots y_n) - f(00y_3\ldots y_n)\right) \mid x\right]\right|$$

$$\le \left|\mathbb{E}_{y_3 \ldots y_n}\left[f(11y_3\ldots y_n) - f(00y_3\ldots y_n) \mid x\right]\right|.$$

To find an upper bound for this expression, it suffices to replace the use of [OWZ11, Lemma 1] by
the following claim.

Claim 2. Let $E = \{i \in [m] : i \equiv 0 \bmod 2\}$ and $O = \{i \in [m] : i \equiv 1 \bmod 2\}$. Let $p_1, \ldots, p_m$ be
a non-negative unimodal sequence and $g : [m] \to \{-1, 0, 1\}$ with the property that the sets $g^{-1}(1) \cap E$
and $g^{-1}(-1) \cap E$ are interleaving, and the sets $g^{-1}(1) \cap O$ and $g^{-1}(-1) \cap O$ are interleaving. Then
$\left|\sum_{i=1}^{m} p_i g(i)\right| \le 2\max\{p_i\}$.

To prove the claim, we simply write $\left|\sum_{i=1}^{m} p_i g(i)\right| \le \left|\sum_{i \in O} p_i g(i)\right| + \left|\sum_{i \in E} p_i g(i)\right|$. Now
[OWZ11, Lemma 1] implies that each term is upper-bounded by $\max\{p_i\}$. □

We are now ready to prove the following result. 

Lemma 3.5. There exists a constant $\gamma < 1/2$ such that for any symmetric Boolean function $f$ with
$r(f) \ge \gamma n$, we have $\log\|\hat f\|_1 = \Omega(n)$.

Proof. Let $\rho = 1 - c/n$ where $c$ is a constant chosen later, and let $n$ be large enough so that $\rho \ge 1/2$.
We apply (14) to $g = f \cdot \mathrm{PARITY}$:

^2\S\(n-\S\)g(S) 2 p\ s U^.R(g). 

Note that $\mathrm{PARITY} = \chi_{[n]}$, which shows $\hat f([n] \setminus S) = \hat g(S)$ for all $S$, and in particular $R(g) = R(f)$.
So we can rewrite the above inequality as

£ \S\(n - \S\)f(Sfp n -W < 4= ■ «(/)■ (I 5 ) 
Summing (14) and (15), we get

2 |5|(n - |S|)/(5) 2 (1 - ,1*1 - p"-!*!) > (l - -A=) (16) 

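Let $\beta < 1/2$ be a positive constant to be chosen later. We have

$$\sum_{|S| \le \beta n}|S|(n-|S|)\hat f(S)^2\left(\rho^{|S|}+\rho^{n-|S|}\right) \ge \sum_{|S| \le \beta n}|S|(n-|S|)\hat f(S)^2\left(\rho^{\beta n}+\rho^{(1-\beta)n}\right)$$

$$\ge \sum_{|S| \le \beta n}|S|(n-|S|)\hat f(S)^2\left(\tfrac12 e^{-c\beta}+\tfrac12 e^{-c(1-\beta)}\right).$$

For the first inequality, we used the fact that $\rho^{|S|} + \rho^{n-|S|}$ is decreasing in $|S|$ for $|S| \le n/2$. For the
second inequality, we used the inequality $(1 - c/n)^{\beta n} \ge e^{-c\beta}/2$ when $1 - c/n \ge 1/2$. Similarly, we
have

$$\sum_{|S| \ge (1-\beta)n}|S|(n-|S|)\hat f(S)^2\left(\rho^{|S|}+\rho^{n-|S|}\right) \ge \sum_{|S| \ge (1-\beta)n}|S|(n-|S|)\hat f(S)^2\left(e^{-c\beta}/2+e^{-c(1-\beta)}/2\right).$$

Summing the two inequalities, we obtain

$$\sum_{|S| \notin (\beta n, (1-\beta)n)}|S|(n-|S|)\hat f(S)^2\left(\rho^{|S|}+\rho^{n-|S|}\right) \ge \left(e^{-c\beta}/2+e^{-c(1-\beta)}/2\right)\sum_{|S| \notin (\beta n, (1-\beta)n)}|S|(n-|S|)\hat f(S)^2.$$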



Combining this with (16), we obtain

£ ^(n-^D/^Cl-pl 5 !-^-! 5 !) 

/3n<|S|<(l-/3)n 

= £ \S\(n - \S\)f(S) 2 (l - P W - p n -W) - J2 l 5 K n " \ S \)f(S)\l - P lSl - P n ~ lsl ) 



> 



S |S|£(/9n,(l-/3)n) 

;i - -^)R(f) - (1 - - e-^/2) £ |5|(n - \S\)f(S)' 
Wire 

V \Smi3n,(l-f})n) 



As e~ c/3 /2 + e"^ 1 "' 3 )/ 2 < ^ this leads to 

Y \S\(n - \S\)f(S) 2 (l - - p n ~\ s \) > [e-^12 + e~< 1 -^ /2 - -J=) R(f). 



/3n<\S\<(l-l3)n 

Consequently, 

i 2 
4 



j8n<|5|<(l-j8)n 

By picking c = 10 4 and /3 = lCT 4 ln2, we have e~ c/3 +e~ c(1 ~ /3) _ _*L > ^. We conclude that 

E/»n<fc<(i-j8)n W k > and thus 



1 — 

fc=0 



10n 2 ' 
Using it follows that

'A' " ° (\/G")(r-l) 2 "") " " (^""-'"f" + I)" 1 ) . 

where $\alpha = (r-1)/n$. If $\alpha$ is such that $h(\alpha) > 1 - h(\beta)/2$, we obtain the desired bound
$\log\|\hat f\|_1 = \Omega(n)$. □

4 Proof of Corollary 1.2

We start by observing that we can assume that $f(x)$ is constant whenever $|x| \in [r_0, n - r_1]$. In fact,
if this is not the case, then $f \cdot \mathrm{PARITY}(x)$ will be constant when $|x| \in [r_0, n - r_1]$. But $f(x)$ can be
computed from $f \cdot \mathrm{PARITY}(x)$ using only one query to $\mathrm{PARITY}(x)$, which multiplies the size of the
tree by at most 2. In the remainder of the proof, we assume $f(x)$ is constant for $|x| \in [r_0, n - r_1]$.

We start by proving the lower bound. It is simple to prove that $\|\hat f\|_1$ is a lower bound on the
parity decision tree size of $f$ [KM91, Lemma 5.1]. For completeness, we provide a sketch of a
proof. As all the possible inputs that lead to some leaf $L$ have the same value for $f$, we can write
$f$ as a sum over all leaves of the tree $f(x) = \sum_L f(L)1_L(x)$, where the function $1_L$ takes value 1
if the input belongs to the leaf $L$ and is 0 otherwise. By linearity of the Fourier transform and the
triangle inequality, we have $\|\hat f\|_1 \le \sum_L \|\hat 1_L\|_1$. Now observe that the inputs corresponding
to $L$ (that we also call $L$) are inputs that satisfy some parity conditions on subsets belonging to
some subspace $\mathcal{S}$. Then, we have $\hat 1_L(S) = \pm\frac{|L|}{2^n}$ for any $S \in \mathcal{S}$. Note that the number of such
subsets is $2^n/|L|$. But if $S \notin \mathcal{S}$, then $\sum_{x \in L}\chi_S(x) = 0$. It follows that $\|\hat 1_L\|_1 = 1$ and that $\|\hat f\|_1$ is a
lower bound on the size of the tree.
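The key step of this sketch, that the indicator of a leaf of a parity decision tree has spectral norm exactly 1, can be verified by brute force. In the sketch below a leaf is described by a list of parity constraints, each a pair (subset of coordinates, required parity bit); the function name, this encoding, and the 4-bit example are assumptions for illustration, and the constraints are assumed to be consistent and linearly independent, as they are along a root-to-leaf path.

```python
from itertools import product

def leaf_indicator_spectral_norm(n, constraints):
    """Return (|L|, spectral norm of 1_L) for L = {x : XOR_{i in T} x_i == b for all (T, b)}."""
    points = list(product((0, 1), repeat=n))
    leaf = [x for x in points
            if all(sum(x[i] for i in T) % 2 == b for T, b in constraints)]
    norm = 0.0
    for S in points:  # S is the 0/1 indicator vector of a subset of coordinates
        coeff = sum((-1) ** sum(s * xi for s, xi in zip(S, x)) for x in leaf) / 2 ** n
        norm += abs(coeff)
    return len(leaf), norm

# Leaf reached after the answers x_0 XOR x_1 = 1 and x_2 = 0 on 4-bit inputs
size, norm = leaf_indicator_spectral_norm(4, [((0, 1), 1), ((2,), 0)])
print(size, norm)
```

Here $|L| = 4$, and the nonzero coefficients are the $2^n/|L| = 4$ sets spanned by the queried parities, each of magnitude $|L|/2^n = 1/4$.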

Using Theorem 1.1, this proves the lower bound stated in Corollary 1.2 except in the case
where $r(f) = 1$. For this case, observe that we can assume that a leaf at depth $d$ corresponds to
$2^{n-d}$ possible inputs; see e.g., [KM91, Lemma 5.1]. So we have at most two input bit strings that
have a value for $f$ that is different from the value $f$ takes when $|x| \in [r_0(f), n - r_1(f)]$. This proves
that the depth of the tree is at least $n - 1$ and completes the proof of the lower bound.

For the upper bound, we give a decision tree of size at most $4\binom{n}{r_0} + 4\binom{n}{r_1}$ for computing
$f$. We start by considering a complete binary tree of depth $n$. Level $i$ of the tree corresponds to
querying the $i$-th input bit $x_i$. The number of leaves of the tree is $2^n$. Clearly, one can compute any
function using such a tree. We are going to use the values $r_0(f)$ and $r_1(f)$ to remove unnecessary
nodes from the tree. Note that each node at level $i$ can be labeled by a bit string of length $i$. We
remove all the nodes that have $r_0$ ones and at least $r_1$ zeros, and the nodes that have $r_1$ zeros and at
least $r_0$ ones, together with all their children. All of these nodes correspond to inputs $x$ for which
$|x| \in [r_0, n - r_1]$, so the value of $f$ is a constant that only depends on $f$.

It now remains to compute the number of leaves of the constructed decision tree. The number
of leaves at a level $i < n$ is 0 if $i < r_0 + r_1$ and $\binom{i-1}{r_0-1} + \binom{i-1}{r_1-1}$ if $i \ge r_0 + r_1$. At level $n$, we have
all the remaining nodes that can have at most $r_0$ ones or at most $r_1$ zeros, thus at most $\binom{n}{r_0} + \binom{n}{r_1}$
leaves. Thus, the total number of leaves is at most

$$\sum_{i=r_0+r_1}^{n-1}\left[\binom{i-1}{r_0-1} + \binom{i-1}{r_1-1}\right] + \binom{n}{r_0} + \binom{n}{r_1}
\le \sum_{i=r_0-1}^{n-1}\binom{i}{r_0-1} + \sum_{i=r_1-1}^{n-1}\binom{i}{r_1-1} + \binom{n}{r_0} + \binom{n}{r_1}
= 2\left[\binom{n}{r_0} + \binom{n}{r_1}\right].$$

We can then obtain the stated result by (1) and the fact that $h(x) \le -2x\log x$ for $x \in (0, 1/2]$.



5 Conclusion and Future Work

A natural next step is to extend Theorem 1.1 to the approximate spectral norm. Indeed this would have
interesting implications. Recall that the $\epsilon$-approximate spectral norm of a Boolean function $f$ is the
smallest spectral norm of a function $g$ with $\|f - g\|_\infty \le \epsilon$, i.e., for all $x$, $|f(x) - g(x)| \le \epsilon$. Trivially
$\|\hat f\|_{1,\epsilon}$ is at most $\|\hat f\|_1$. We conjecture that it cannot be much smaller.

Conjecture 5.1. For all symmetric functions $f : \{0,1\}^n \to \{\pm 1\}$,

$$\log\|\hat f\|_1 = \Theta^*\left(\log\|\hat f\|_{1,1/3}\right)$$

where $\Theta^*$ suppresses $O(\log n)$ factors.

We now discuss some of the applications of the above conjecture in conjunction with Theorem 1.1.


Analog of Paturi's Result for Monomial Complexity

A famous result of Paturi [Pat92] characterizes the approximate degree of all symmetric functions.
Recall that the degree of a function $f$ is the largest $|S|$ such that $\hat f(S)$ is non-zero. Let $t_0$ and $t_1$ be
the minimum integers such that $f(i) = f(i+1)$ for all $i \in [t_0, n - t_1]$.

Theorem 5.2 ([Pat92]). Let $f : \{0,1\}^n \to \{\pm 1\}$ be a symmetric function and let $t_0$ and $t_1$ be defined as
above. Then, $\deg_{1/3}(f) = \Theta\left(\sqrt{n(t_0 + t_1)}\right)$.

Paturi's result has found numerous applications in theoretical computer science
[Raz03, BBC+01, She09, dW08, She11].

The monomial complexity of a Boolean function /, denoted mon(/), is the number of non-zero 
Fourier coefficients of /. The monomial complexity appears naturally in various areas of complex- 
ity theory, and it is desirable to obtain simple characterizations for natural classes of functions. An 
argument similar to the one in [BS92J shows that mon e (/) < pr||/||i for every e > 0. Combining 
this with Conjecture 5.1 and Theorem 1.1 would show that $r(f)$ characterizes the approximate
monomial complexity of $f$:

Conjecture 5.3 (Consequence of Conjecture 5.1). For a symmetric function $f : \{0,1\}^n \to \{\pm 1\}$,

$$\log \mathrm{mon}_{1/3}(f) = \Theta^*(r(f)).$$
Communication Complexity of Xor Functions

Recall the Log Rank Conjecture mentioned in the introduction. This conjecture has an analogous
version for the randomized communication complexity model: the "Log Approximation Rank
Conjecture". The $\epsilon$-approximate rank of a matrix $M$ is denoted by $\mathrm{rank}_\epsilon(M)$, and is the minimum
rank of a matrix that $\epsilon$-approximates $M$. Denote by $R_\epsilon(F)$ the $\epsilon$-error randomized communication
complexity of $F$. It is known that $R_\epsilon(F) \ge \log \mathrm{rank}_{\epsilon'}(M_F)$, where $\epsilon'$ is a constant that depends on
$\epsilon$ and $M_F$ is the matrix representation of $F$. The Log Approximation Rank Conjecture states that this
lower bound is tight:

Conjecture 5.4 (Log Approximation Rank Conjecture). There is a universal constant $c$ such that for
any 2 party communication problem $F$,

$$\log \mathrm{rank}_{\epsilon'}(M_F) \le R_\epsilon(F) \le \log^c \mathrm{rank}_{\epsilon'}(M_F).$$

The important paper of Razborov [Raz03] established this conjecture for the functions $F(x,y) =
f(x \wedge y)$ where $f$ is symmetric. In fact, Razborov showed that the quantum and classical randomized
communication complexities of such functions are polynomially related. Later, Shi and Zhang
[SZ09], via a reduction to the case $f(x \wedge y)$, showed the quantum/classical equivalence for
symmetric xor functions $F(x,y) = f(x \oplus y)$. They show that the randomized and quantum bounded
error communication complexities of $F$ are both $\Theta(r(f))$, up to polylog factors. However, their
result does not verify the Log Approximation Rank Conjecture for symmetric xor functions.

Conjecture 5.1 along with Theorem 1.1 would verify the Log Approximation Rank Conjecture
for symmetric xor functions. (This follows from the protocol of Shi and Zhang [SZ09, Proposition
3.4] for symmetric xor functions, and the facts $\|M_F\|_{tr,\epsilon} = 2^n\|\hat f\|_{1,\epsilon}$ and $\mathrm{rank}_\epsilon(M_F)^{1/2} \ge
\|M_F\|_{tr,\epsilon}/((1+\epsilon)2^n)$.) Furthermore, we would obtain a direct proof of the result of Shi and Zhang.
This is very desirable since a major open problem is to understand the communication complexity
of $f(x \oplus y)$ for general $f$ (with no symmetry condition on $f$). There is a sentiment that this should
be easier to tackle than $f(x \wedge y)$ as xor functions seem more amenable to Fourier analytic
techniques. A direct proof of the result of Shi and Zhang gives more insight into the communication
complexity of xor functions.

Agnostically Learning Symmetric Functions 

Let $\mathcal{C}$ be a concept class and $g_i : \{-1,1\}^n \to \mathbb{R}$ be functions for $1 \le i \le s$ such that every
$f : \{-1,1\}^n \to \{-1,1\}$ in $\mathcal{C}$ satisfies $\|f - \sum_{i=1}^{s} c_i g_i\|_\infty \le \epsilon$, for some reals $c_i$. The smallest $s$ for
which such $g_i$'s exist corresponds to the $\epsilon$-approximate rank of $\mathcal{C}$. If each $g_i(x)$ is computable in
polynomial time, then $\mathcal{C}$ can be agnostically learned under any distribution in time $\mathrm{poly}(n, s)$ and
with accuracy $\epsilon$ [KKMS08].

Klivans and Sherstov [KS10] proved strong lower bounds on the approximate rank of the concept
class of disjunctions $\{\bigvee_{i \in S} x_i : S \subseteq [n]\}$ and majority functions $\{\mathrm{MAJ}(\pm x_1, \pm x_2, \ldots, \pm x_n)\}$,
thereby ruling out the possibility of applying the algorithm of [KKMS08] to agnostically learning
these concept classes.

Theorem 1.1 together with Conjecture 5.1 provides additional negative results and gives strong
lower bounds on the approximate rank of the concept class consisting of symmetric functions $f$
with large $r(f)$.